REVISED DAG ANALYSIS CODE: bad-controls-peer-bias-dag8

Author

Dan Swart


DAG RENDERING USING DiagrammeR.

DiagrammeR is used here only to render the DAG; the analysis itself follows below.

Show the code
library(DiagrammeR)

grViz("
  digraph DAG {
    # Graph settings
    graph [layout=neato, margin=\"1.0,1.0\", rankdir=TB, size=\"14,12\"]
    
    # Add a title using a simple label approach
    labelloc=\"t\"
    label=\"Bad Controls: Peer Bias\\n \\n\"
    fontname=\"Cabin\"
    fontcolor=\"darkgreen\"
    fontsize=26
    
    # Node settings - make nodes larger with fontsize
    node [shape=plaintext, fontsize=26, fontname=\"Cabin\"]  // larger node labels
    
    # Edge settings - make edges thicker and arrows larger
    edge [penwidth=4.0, color=\"darkblue\", arrowsize=1.5]  // thicker edges, larger arrowheads
    
    # Nodes with exact coordinates
    X [label=\"X\", pos=\"1.0, 1.0!\", fontcolor=\"dodgerblue\"]
    Y [label=\"Y\", pos=\"4.0, 1.0!\", fontcolor=\"dodgerblue\"]
    E [label=\"E\", pos=\"2.5, 3.0!\", fontcolor=\"black\"]
    Q [label=\"Q\", pos=\"4.0, 3.0!\", fontcolor=\"purple\"]
    
    
    # Edges
    X -> Y
    X -> E
    E -> Y
    Q -> Y
    Q -> E
    
    # Caption as a separate node at the bottom
    Caption [shape=plaintext, label=\"Cinelli, Forney, Pearl 2021 A Crash\\nCourse in Good and Bad Controls\", 
             fontsize=20, pos=\"2.5,0.0!\"]
  }
  ")


DAG Visualization using ggdag and dagitty

Show the code
# Define the DAG
peer_bias_dag8 <- ggdag::dagify(
  Y ~ X,   # Y is influenced by X
  E ~ X,
  Y ~ E,  
  E ~ Q,
  Y ~ Q,
  exposure = "X",
  outcome = "Y",
  # Add labels here:
  labels = c(X = "X", 
             Y = "Y", 
             E = "E",
             Q = "Q"),
  coords = list(x = c(X = 1.0, Y = 4.0, E = 2.5, Q = 4.0),  
                y = c(X = 1.0, Y = 1.0, E = 3.0, Q = 3.0))
)

# Create a nice visualization of the DAG
ggdag::ggdag_status(peer_bias_dag8) +
  ggdag::theme_dag(base_size = 18) +
  ggplot2::labs(title = "Bad Controls: Peer Bias")

Cinelli, Forney, Pearl 2021 A Crash Course in Good and Bad Controls

Executive Summary: Peer Bias as a Bad Control

Peer bias occurs when we adjust for a variable E that is influenced by both the exposure X and an unmeasured confounder Q, which also affects the outcome Y. In this DAG structure, E is a mediator between X and Y, but is also affected by the confounder Q that directly affects Y.

Why is it a “Bad Control”?

Controlling for E in this structure is harmful because:

  1. It blocks part of the causal effect: By conditioning on E, we’re blocking the indirect effect of X on Y that flows through E.

  2. It opens a collider path: Conditioning on E opens a non-causal path between X and Y through Q (X → E ← Q → Y), potentially creating bias.

  3. It can distort the total effect: The adjustment might lead to estimates that don’t reflect the true causal relationship between X and Y.

Real-World Example

A researcher is studying the effect of a new teaching method (X) on student final exam scores (Y):

  • The teaching method (X) affects student engagement (E).
  • Student engagement (E) affects final exam scores (Y).
  • Student natural ability (Q) affects both engagement (E) and exam scores (Y).
  • The teaching method (X) also has a direct effect on exam scores (Y).

If the researcher controls for student engagement (E), they block the indirect effect of the teaching method (X) through engagement (E) and potentially introduce bias through the opened collider path.
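This can be checked with a small simulation. The variable names follow the example above; all coefficients and the data-generating model are illustrative assumptions, not estimates from real data:

```r
# Illustrative simulation of the peer-bias DAG (all coefficients are assumed)
set.seed(42)
n <- 1e5
Q <- rnorm(n)                                # unmeasured natural ability
X <- rbinom(n, 1, 0.5)                       # teaching method (randomized)
E <- 0.8 * X + 0.6 * Q + rnorm(n)            # engagement, caused by X and Q
Y <- 1.0 * X + 0.5 * E + 0.7 * Q + rnorm(n)  # exam score, caused by X, E and Q

# True total effect of X on Y: 1.0 (direct) + 0.8 * 0.5 (via E) = 1.4
coef(lm(Y ~ X))["X"]      # close to 1.4: the unadjusted regression recovers it
coef(lm(Y ~ X + E))["X"]  # neither 1.4 nor 1.0: controlling for E blocks the
                          # mediated path and opens X -> E <- Q -> Y
```

With a mediator and an unmeasured mediator-outcome confounder, the unadjusted regression recovers the total effect, while adjusting for E yields neither the total nor the direct effect.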

How to Avoid Peer Bias

  1. Consider total effects carefully: Determine whether you’re interested in the total effect (direct + indirect) or just the direct effect of X on Y.

  2. Be cautious with mediators: Think carefully before adjusting for variables that lie on the causal pathway between exposure and outcome.

  3. Account for unmeasured confounders: Consider the possibility of unmeasured variables that might affect both your mediators and outcomes.

  4. Use appropriate causal inference methods: Methods like mediation analysis can help decompose direct and indirect effects properly.

Peer bias demonstrates the importance of carefully considering the causal structure before deciding which variables to control for in your analysis.


2. Results

2.1 Table of Key DAG Properties

Show the code
DT::datatable(
  properties_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 1: Key Properties of the DAG
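The `properties_df` object is assumed to be built in an earlier chunk that is not shown. A hypothetical sketch of how such a summary could be assembled with dagitty (the column names are illustrative):

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

p <- paths(dag, "X", "Y")
properties_df <- data.frame(
  Property = c("Acyclic DAG",
               "Number of paths",
               "Number of backdoor paths",
               "Minimal adjustment sets (total effect)"),
  # c() coerces everything to character so TRUE/FALSE print as text
  Value = c(as.character(isAcyclic(dag)),
            length(p$paths),
            sum(grepl("^X <-", p$paths)),  # backdoor paths start with an arrow into X
            length(adjustmentSets(dag)))
)
properties_df
```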


2.2 Table of Conditional Independencies

Show the code
DT::datatable(
  independencies_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
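The `independencies_df` object is likewise assumed to come from an earlier chunk; the underlying result can be obtained directly from dagitty:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

# X and Q are the only pair with no direct edge, so the DAG implies a
# single testable independency: X is marginally independent of Q
impliedConditionalIndependencies(dag)
```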


2.3 Table of Paths Between X and Y

Show the code
DT::datatable(
  paths_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 2: All Paths Between X and Y
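The same enumeration can be reproduced with dagitty's `paths()`; `paths_df` is assumed to be built from output like this:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

# Three paths connect X and Y:
#   X -> Y            directed, open
#   X -> E -> Y       directed (mediated), open
#   X -> E <- Q -> Y  closed, because E is a collider on it
paths(dag, "X", "Y")
```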


2.4 Table of Ancestors and Descendants

Show the code
DT::datatable(
  ancestors_descendants_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 3: Ancestors and Descendants
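A sketch of how these relationships can be queried with dagitty (note that `ancestors()` and `descendants()` include the queried node itself):

```r
library(dagitty)

dag <- dagitty("dag { X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y }")

ancestors(dag, "Y")    # X, E and Q can all causally affect Y
descendants(dag, "X")  # X affects E and Y
ancestors(dag, "X")    # nothing in this DAG causes X
```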


2.5 Table of D-Separation Results

Show the code
DT::datatable(
  d_sep_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 4: D-Separation Test Results
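The key d-separation facts for this DAG can be checked directly with dagitty's `dseparated()`:

```r
library(dagitty)

dag <- dagitty("dag { X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y }")

# X and Q are d-separated marginally: every path between them passes
# through a collider (E or Y)
dseparated(dag, "X", "Q", list())  # TRUE

# Conditioning on E opens the collider path X -> E <- Q
dseparated(dag, "X", "Q", "E")     # FALSE
```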


2.6 Table of Impact of Adjustments

Show the code
DT::datatable(
  adjustment_effect_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 5: Effect of Different Adjustment Sets
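A sketch of how the open-path counts behind such a table can be computed, using the `Z` argument of dagitty's `paths()`:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

p_none <- paths(dag, "X", "Y")           # no adjustment
p_E    <- paths(dag, "X", "Y", Z = "E")  # adjusting for E

sum(p_none$open)  # 2: both causal paths (X -> Y and X -> E -> Y) are open
sum(p_E$open)     # still 2: the mediated path closes, but the biasing
                  # path X -> E <- Q -> Y opens
```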


2.7 Table of Instrumental Variables

Show the code
DT::datatable(
  instruments_df,
  rownames = FALSE,
  options = list(
    pageLength = 10,
    ordering = TRUE,
    searching = FALSE
  ),
  class = 'cell-border stripe'
)
Table 6: Potential Instrumental Variables
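For completeness, a sketch of the corresponding dagitty query. In this DAG no variable causes X, so the result is empty:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

# No variable in this DAG causes X, so there is no candidate instrument
instrumentalVariables(dag)
```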


3. Visualizing Status, Adjustment Sets and Paths with ggdag

Show the code
# Create dagitty object with ggdag positioning
dag <- dagitty::dagitty("dag {
  Y <- X
  E <- X
  Y <- E
  E <- Q
  Y <- Q
  
  X [exposure]
  Y [outcome]
}")

# Set coordinates for visualization in dagitty format
dagitty::coordinates(dag) <- list(x = c(X = 1.0, Y = 4.0, E = 2.5, Q = 4.0),  
                         y = c(X = 1.0, Y = 1.0, E = 3.0, Q = 3.0)
)

# Convert to ggdag format
dag_tidy <- ggdag::tidy_dagitty(dag)

# Status plot showing exposure/outcome
ggdag::ggdag_status(dag_tidy) +
  ggdag::theme_dag(base_size = 16) +
  ggplot2::labs(title = "Status Plot: Exposure and Outcome")

# Adjustment set visualization
ggdag::ggdag_adjustment_set(dag_tidy) +
  ggdag::theme_dag(base_size = 16) +
  ggplot2::labs(title = "Adjustment Sets for X → Y")

# Paths visualization
ggdag::ggdag_paths(dag_tidy) +
  ggdag::theme_dag(base_size = 16) +
  ggplot2::labs(title = "All Paths between X and Y")

Status Plot: Exposure and Outcome

Adjustment Sets for X → Y

All Paths between X and Y

Different visualizations of the DAG


4. Interpretation and Discussion

4.1 Key Insights about this Peer Bias DAG Structure

This DAG represents a causal network with a peer bias structure, examining the relationship between X and Y with E as a mediator and Q as an unmeasured confounder:

  1. Direct Causal Effect (X → Y)
    • X directly affects Y
    • This represents one component of the causal effect we’re interested in measuring
  2. Mediation Path (X → E → Y)
    • X affects E, which in turn affects Y
    • E is a mediator on the causal pathway from X to Y
    • This represents the indirect effect of X on Y through E
  3. Unmeasured Confounder (Q)
    • Q affects both E and Y
    • Q creates a backdoor path between E and Y
    • This introduces confounding in the mediator-outcome relationship
  4. Peer Bias
    • Adjusting for E while failing to adjust for Q can create bias
    • This happens because:
      • Conditioning on E blocks the indirect effect of X on Y through E
      • Conditioning on E opens a non-causal path between X and Y through Q (X → E ← Q → Y)
    • The resulting estimate may not reflect the total causal effect of X on Y

4.2 Proper Identification Strategy

To identify the causal effect of X on Y:

  • For the total effect of X on Y, do not adjust for E (the mediator).
  • If adjusting for E is necessary (e.g., to estimate the direct effect), also adjust for Q to block the opened collider path.
  • If Q is unmeasured (as is often the case in real-world scenarios), consider:
      • Using sensitivity analysis to assess the potential impact of unmeasured confounding
      • Looking for proxy variables for Q
      • Using mediation analysis methods that can account for unmeasured confounding
  • The key insight is that adjusting for E without adjusting for Q leads to a biased estimate of the total causal effect.
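This strategy can be verified with dagitty's `adjustmentSets()`:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y ; X -> E ; E -> Y ; Q -> E ; Q -> Y
  X [exposure] ; Y [outcome]
}")

# Total effect: X has no parents, so the empty set suffices
adjustmentSets(dag, effect = "total")

# Direct effect: blocking the mediator E requires also blocking Q
adjustmentSets(dag, effect = "direct")
```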


Glossary

Key DAG Terms and Concepts

DAG (Directed Acyclic Graph): A graphical representation of causal relationships where arrows indicate the direction of causality, and no variable can cause itself through any path (hence “acyclic”).

Exposure: The variable whose causal effect we want to estimate (often called the treatment or independent variable).

Outcome: The variable we are interested in measuring the effect on (often called the dependent variable).

Confounder: A variable that influences both the exposure and the outcome, potentially creating a spurious association between them.

Mediator: A variable that lies on the causal pathway between the exposure and outcome (exposure → mediator → outcome).

Collider: A variable that is influenced by two other variables on a path (e.g., A → C ← B). Conditioning on a collider opens the path it sits on, which can induce a spurious association.

Backdoor path: Any non-causal path connecting the exposure to the outcome that creates a spurious association.

Instrumental Variable: A variable that affects the exposure but has no direct effect on the outcome except through the exposure.

Peer Bias: A type of bias that occurs when we adjust for a variable E that is influenced by both the exposure X and an unmeasured confounder Q, which also affects the outcome Y. This can block mediation paths while opening collider paths.

Understanding the Analysis Tables

1. Key Properties Table

This table provides a high-level overview of the DAG structure and key causal features:

  • Acyclic DAG: Confirms the graph has no cycles (a prerequisite for valid causal analysis)
  • Causal effect identifiable: Indicates whether the causal effect can be estimated from observational data
  • Number of paths: Total number of paths connecting exposure and outcome
  • Number of backdoor paths: Paths creating potential confounding that need to be blocked
  • Direct effect exists: Whether there is a direct causal link from exposure to outcome
  • Potential mediators: Variables that may mediate the causal effect
  • Number of adjustment sets: How many different sets of variables could be adjusted for
  • Minimal adjustment sets: The smallest sets of variables that block all backdoor paths

2. Conditional Independencies Table

Shows the implied conditional independencies in the DAG: pairs of variables that should be statistically independent when conditioning on specific other variables. These can be used to test the validity of your DAG against observed data.

3. Paths Analysis Table

Enumerates all paths connecting the exposure to the outcome:

  • Path: The specific variables and connections in each path
  • Length: Number of edges in the path
  • IsBackdoor: Whether this is a backdoor path (potential source of confounding)
  • IsDirected: Whether this is a directed path from exposure to outcome

Testing whether these paths are open or closed under different conditioning strategies is crucial for causal inference.

4. Ancestors and Descendants Table

Shows which variables can causally affect (ancestors) or be affected by (descendants) each variable in the DAG:

  • Understanding ancestry relationships helps identify potential confounders
  • In this DAG, X is an ancestor of E and Y, and Q is also an ancestor of E and Y

5. D-Separation Results Table

Shows whether exposure and outcome are conditionally independent (d-separated) when conditioning on different variable sets:

  • Is_D_Separated = Yes: This set of conditioning variables blocks all non-causal paths
  • Is_D_Separated = No: Some non-causal association remains

This helps identify sufficient adjustment sets for estimating causal effects.

6. Impact of Adjustments Table

Shows how different adjustment strategies affect the identification of causal effects:

  • Total_Paths: Total number of paths between exposure and outcome
  • Open_Paths: Number of paths that remain open after adjustment

Ideally, adjusting for the right variables leaves only the causal paths open.

7. Instrumental Variables Table

Lists potential instrumental variables: variables that affect the exposure but have no direct effect on the outcome except through the exposure.

How to Use This Analysis for Causal Inference

  1. Identify mediation effects: In this DAG, E is a mediator between X and Y. If you’re interested in the total effect of X on Y, don’t control for E.

  2. Be cautious with mediator adjustment: When adjusting for mediators like E, be aware that this can induce collider bias when unmeasured confounders like Q exist.

  3. Validate your DAG: Use the implied conditional independencies to test your causal assumptions against observed data.

  4. Consider unmeasured confounders: Always be aware of potential unmeasured confounders like Q and how they might affect your analysis, especially when adjusting for mediators.

  5. Choose appropriate analysis techniques: When dealing with mediation, consider using formal mediation analysis techniques rather than simple regression adjustment.

Remember that the validity of any causal inference depends on the correctness of your DAG: it represents your causal assumptions about the data-generating process, which should be based on substantive domain knowledge.